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Abstract: A set of curves or images of similar shape is an increasingly 
common functional data set collected in the sciences. Principal Component 
Analysis (PCA) is the most widely used technique to decompose varia- 
tion in functional data. However, the linear modes of variation found by 
PCA are not always interpretable by the experimenters. In addition, the 
modes of variation of interest to the experimenter are not always linear. 
We present in this paper a new analysis of variance for Functional Data. 
Our method was motivated by decomposing the variation in the data into 
predetermined and interpretable directions (i.e. modes) of interest. Since 
some of these modes could be nonlinear, we develop a new defined ratio 
of sums of squares which takes into account the curvature of the space of 
variation. We discuss, in the general case, consistency of our estimates of 
variation, using mathematical tools from differential geometry and shape 
statistics. We successfully applied our method to a motivating example of 
biological data. This decomposition allows biologists to compare the preva- 
lence of different genetic tradeoffs in a population and to quantify the effect 
of selection on evolution. 
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1. Introduction 

Researchers in an increasing number of fields, including biology, medicine, and 
engineering, collect samples of curves, images or objects of common shape. 
Growth curves of plants and animals, gene expression signals, medical images, 
and human speech are all real life examples o f high dimensional multivar i- 
ate or functional data, see grvden and Mardial (|l998f ): iFletcher et all (|2004h ; 



Ramsav and Silverman ( 20051 ). Understanding the variation in this type of data 



set is usually of primary interest. Principal Component Analysis (PCA) is the 
most widely used method to decompose variation in functional data. However, 
the directions of variation found by PCA arc data driven and linear. So, prin- 
cipal directions are not always easily intcrpretable by the experimenters and the 
decomposition fails in the presence of nonlinear variation. The analysis presented 
in this paper decomposes the variation in terms of modes that are predetermined 
by the experimenters. Since some of these modes could be nonlinear, we develop 
a new method, Template Mode of Variation or TMV, to decompose variation 
along nonlinear modes. This decomposition problem is particularly challenging 
because of the complexity of non-Euclidean geometry Our method was applied 
successfully to several sets of reaction nor m curves in evolu tionary biology, ther 



mal performance curves of caterpillars in llzeml |2004j) and llzem and Kingsolver 
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Fig 1. Thermal Performance Curves of Caterpillars. Each curve represents the family growth 
rate z at six temperatures t (in Celsius). The between curve variation represents the genetic 
variation in the population. 



(|2005l) . Hies in llzeml (|2004j ) and viruses in lKnies et al. (2006). The matlab code 
for this decomposition is available at http://www.fas.harvard.edu/~rizcm. The 
method that we present here could be generalized to decompose predetermined 
modes in any set of curves or images and along directions of interest that satisfy 
our assumptions. 

The data and problem which motivated our analysis is presented in Subsec- 
tion 11.11 We discuss the main assumptions of our analysis in Subsection 11.21 
We contrast our approach to other approaches in the literature for analysis 
of nonlinear variation in Functional Data Analysis (FDA), Shape Analysis, or 
Manifold Learning in Subsection 11.31 Finally, we summarize the content of the 
paper in Subsection II. 41 



1.1. Motivating data and problem 

The data which motivated the analysis in this paper comes from evolutionary 
biology and is shown in Fig. [TJ Each curve shows the growth rate as a function 
of temperature for a given family of caterpillars, where a family is a set of 
offsprings of same parents. This data is an example of reaction norm curves, 
a widely collected data set in biology, where each curve show s the change of a 
trait as a function of the environment for some genotype, see Scheiner 1 19931 ). 



Thus, in our example, the growth rate is the trait of interest, the temperature 
is the environmental condition of interest, and each family represents a different 
genotype in the population. Other exam ples of reacti o n nor m curves are growth 
of a virus as a function of temperature in Knies et al.l ( 2006h and photosynthesis 



of a plant as a function of light intensity or nutrients in the soil. 

The curves' variation represents the genetic variation in the population for 
different environmental conditions. Three gene-environment interactions were 
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identified by biologists as being of particular intere s t: hor izontal shift, vertical 



shift and generalist-specialist, see iKingsolver et alj (|200ll ). These three varia 



tions are shown in Fig. [2] The first mode of variation, the vertical shift, corre- 
sponds to a population where some genes clearly dominate other genes in all 
environments. The two other modes of variation, the horizontal shift and the 
generalist-specialist, show two different gene-environment tr ade-offs where all 
genes are better in some environments and worse in others. In IKingsolver et al.l 



1 20011 ). the given hypothesis is that all these directions exist simultaneously, 
and there is a need for decomposing the genetic variation in the data into these 
modes of interest to see which directions are more important. 

PCA is one of the most commonly used method to decompose varia t ion in 



functional data. PCA was applied to the data set in IKingsolver et alJ (|2004l ) 



but it failed to answer the biological questions of interest. In particular, the 
paper found that the first linear principal directions were difficult to interpret 
biologically and could not distinguish the horizontal shift from the generalist- 
specialist. PCA failed to give biologically meaningful results in this data set 
because: first, PCA decomposition is data-driven rather than hypothesis driven; 
second, PCA can only find linear directions of interest and some of the directions 
of interest above are nonlinear. 

The method described in this paper was successfully applied to decompose 
the variation in the data shown in Fig. [1] onto the three modes of variation of 
interest to biologists represented in Fig. [21 Our results arc presented in Section[7] 
and show that the model fits 84% of the variation in the data, most of the 
variation (73%) is explained by the two nonlinear modes and the remaining 
11% is explained by the, linear, vertical shift. 

1.2. Common shape and predetermined directions 

Our analysis assumes that functional data sets have common shape and that 
the modes of variation of interest are predetermined and parameterizable. 

In the caterpillar example shown in Fig. [TJ we see that each curve has a 
common shape. Each curve increases slowly, tends to reach a maximum and 
finally decreases rapidly. This common shape assumption is meaningful for sev- 
eral examples in functional data. Because functional data represent different 
realizations of the same underlying process, growth rate in our example, it is of- 
ten considered t hat the variation is around a common template shape re f lectin g 
that process, s ee Wang and Gassed (|T9^ll999( l: lRamsay and Silverman! (|2005! ) : 



Drvden and Mardial (|l998h . In practise, it is often easy to determine the common 



shape from simply looking at the data. For example: a template gene expression 
signal could be periodic in time with each period corresponding to a cell cycle, 
and a template growth curve over time of a plant could be a logistic function. 
However, the term common shape is not easily formally defined. In this paper, 
the term of common shape or template shape will mean the shape in the center 
of the variation in the data. 

There are often some directions of interest identified by the experimenters 
around this common template shape which we call modes of variation. This 
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Fig 2. Three modes of variation of biological interest. Top: Vertical shift of curves: Zi(t) = 
z(t) + hi, where z(t) is the common shape curve and hi is the height of family i. Middle: 
Horizontal shift of curves: Zj(t) = z(t — mi), where z(t) is the common shape curve and 
m,i is the location of maximum of family i. Bottom: Generalist-Specialist variation: Zi(t) = 
Wiz(wit), where z is the common shape curve and Wi is the width of the curve of family i. 



term was used early on, as in ICastro et al.l (1986:). When these directions are 
predetermined, the variation in the data is then fully described by the template 
shape and the modes of variation around the template shape. In our exam- 
ple, we const r ucted the following 3-parameter Shape Invariant Model (SIM) 



Lawton et al.l (|1972n to model the three modes of variation of interest 



Zi(tj) = Wiz(wi(tj - mi)) + hi + eij, 



(1.1) 
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where Ziitj) is the growth rate of family i at temperature tj. The function 
z represents the common shape. Parameters (hi,mi,Wi) represent the vertical, 
horizontal and gencralist-specialist variation of family i from the template shape. 
The vertical shift is a linear mode of variation, however since z is not a linear 
function, the modes of variation parameterized by m and w are nonlinear. Note 
that the three parameters are on different scales, h is on the growth rate scale, 
m on the temperature scale and w is unit-less. Thus, it is difficult to compare 
the magnitude of the variance of different parameters. In addition, it is difficult 
to quantify the contribution of the variation of each parameter to the between 
growth curve variation. Our method is a nonlinear ANOVA type decomposition 
of the variation in growth rate induced by the variation of the parameters. Be- 
cause our decomposition is done in the same scale as the data, the contributions 
of the parameters to the total variation in the data are comparable. 

The 3-parameter SIM is a particular case of a genera l Self Modeling Nonlinear 



Regression (SEMOR) model iKneip and Gasseri (|1988l ) 



Ziitj) = fl(Mi) + Cj, 1 < * < n; 1 <j < d (1.2) 

where the response Zi varies nonlinear ly as a function of t S K, where Bi 6 W l 
is the vector of parameters of variation, and where R is the common regression 
function. In the caterpillar example, 9i is the vector (iUj,mj, hi), and R(9i,t) = 
Wiz(wi(t — m,-)) + hi. When R is linear in 9i the modes of variation are all 
linear. Other examples of modes are: variation in frequency and duration of 
speech signals, or variation in asymptotic height and growth rate of a logistic 
growth curve of plants. Definitions and results in this paper will be formulated 
in terms of the general model (|1.2| . Theoretical results will also assume that 
the common shape R is known. We discuss how to estimate R from the data 
using a procrustes type method in Section [5] 



1.3. Our method and nonlinear variation in the literature 



The importance of nonlinear variation of curves around a common shape, such 
as registration, or horizontal shi ft, has been recognized f or some tim e in the 
FDA literature, see Chapter 5 in iRamsav and Silverman! (|2002l , I2005T ). SIM or 
SEMOR models are also a common way to account for nonlinearity in FDA , 



some examples of t he ex t ensive literature in the topi c arelLawton et al 



Kneip and Gasser dl988l): Hardle and Marronl ( 1990 ); Kneip and Enge^ 
Lindstroml(|l995fklLadd and Lindstroml (j200ol ); lBrumback and Lindstroml (|20o3) 



1972 ) 
1995) 



However, most registration methods which account for nonlinearity of the data 
have treated nonlinearity as a nuisance. Th ese methods remov e nonlinear m odes 
to better estimate the common curve as in IWang and Gasseri (I1997L Il999h. and 
then to apply conventional linear techniques such as PCA in ISilvermanl (| 19951 ). 
In contrast, motivated by the goals of our collaborators, our premise is that 
nonlinear variation should be part of the decomposition. We believe that when 
nonlinear variation is itself of main interest, as is the case in some of the examples 
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cited above, these modes should be properly accounted for in the decomposi- 
tion. In SIM models, the focus has been on estimati ng the parameters a nd th e 
common shape curve in a n optimal way, suc h as in iKneip and Gasser (Il988l) : 
Hardle and Marron (|l990h ; IKneip and Engell (|l995h and quantifyin g the vari- 
ability by assuming random parameters a nd estimating their variance iLindstrom 
( 19951 ) : lBrumback and Lindstroml (|2004 ). The variance of each parameter in the 
SIM model can be used to quantify the variation in the data along a mode. How- 
ever, as noted in Subsection 11.21 the parameters are on different scales which 
makes it difficult to compare the contribution of each parameter to the total 
variation in the data. In addition, the paramctrization might not be unique and 
different parametric scales would have different variance values. In this paper, 
instead of using the variance of the parameters to quantify variability, we de- 
compose the variation in the natural or intrinsic scale of the data in a nonlinear 
ANOVA type decomposition. The results of our decomposition are not tied to 
the parametrization we chose since an equivalent parametrization would produce 
the same space of variation. Thus, an equivalent parametrization of the modes 
of variation will produce the same decomposition. Our decomposition is also 
invariant to a linear transformation of the data, since a linear transformation of 
the data will be absorbed by the common shape function. 

In Sha pe Analy s is, the notion of mean and variance in a metric manifold were 
defined in iFrechen (1948) as the Frechet mean and Frechet variance. These no- 
tions are widely studied in the field of robust statistics and have more recently 
been used by proba bilist s for c haracterizing distributions of shape data, se e 
Kume and Lei (|2000h ; lie] <|200lh : iBhattacharva and Patrangenarul (|2003l . l2005h . 



These authors showed fundamental results for the Frechet mean, conditions of 
its existence, consistency of its estimates from the sample, and a central limit 
theorem for the Frechet mean on a Riemannian manifold. Our focus in this pa- 
per is not in estimating the Frechet mean, but in decomposing variability. One 
drawback of the Frechet variance is that it is a number regardless of the dimen- 
sionality of the manifold. It represents the total variation in the data, but it docs 
not offer the full understanding of the variation that the variance-covariance ma- 
trix offers in the Euclidean case. In our approach, we decompose the Frechet 
variance SSM into meaningful quantities, each representing variation along a 
mode of interest, i.e. 



SSM = J2^SMi 



where SSMi quantifies the variation along mode i. This goal is reached by 
defining new metrics which allow for the decomposition of the variation. More 
precisely, we define a new dy in the manifold such that, 



SSM = Y J dv(R(8i,t),R) 



where R is a Frechet mean. In the one dimensional case, the metric is sim- 
ply the arcdistance or shortest distance between two points along the curved 
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one dimensional diffcrentiablc manifold. In the two dimensional case we use a 
Pythagorean like formula to define a path metric in the manifold as a function 
of the arcdistances along each mode. Contrary to the Euclidean case, the path 
metric between two points along one-dimensional geodesies are not unique. So, 
our final distance accounts for this multiplicity by assigning weights for each 
path metric. 

Ma nifo ld learning methods a s in Hastie and Stuetzlel (1 198S )± Tenenbaum et al 



d200d): iDonoho and Grimes! (l2003h: Ide Silva and Tenenbaum! (j2003al ); 



Ide Silva and Tenenbaum ( 2003b ); Fletcher et al.l ( 2004 ) extend the dimension- 
ality reduction aspect of PCA to multivariate data lying in a nonlinear manifold. 
They are data-driven and exploratory, so the results they find are not always 
interpretable. Moreover, these methods do not attempt to quantify the vari- 
ation along given nonlinear principal directions. In contrast, in our approach, 
we decompose the variation into pre-identified, and interpretable, modes given 
by the experimenter. Lastly, our decomposition exploits the fact that the data 
of interest to us are not arbitrary multivariate data but instead are functional 
data, with approximately common shape. 



1.4- Structure of this paper 



We illustrate the geometry of nonlinear spaces of variation with four toy exam- 
ples in Section [2j Two toy examples illustrate one dimensional spaces of varia- 
tion, and two other toy examples illustrate two dimensional spaces of variation. 
In Section [3] we define the Frechet mean and variance for functional random 
variables varying in a manifold. We also propose a ratio measuring the variation 
along nonlinear modes. Section 3] discusses our choice of metrics in the manifold 
and the proposed quantification of one, two or multiple simultaneous modes of 
variation. In the one-dimensional case, the metric is simply the arcdistancc. In 
the two-dimensional case and higher, the choice of metric is key to decomposing 
variability. The metric in higher dimensions is defined by a Pythagorean like 
formula using all the one dimensional arcdistances. The consistency of our esti- 
mates is shown in Section [5] We discuss the implementation of this method as 
well as present results of our method on the motivating example in Section [6l 
We finally discuss the result of the decomposition to the motivating data set in 
Section [3 



2. Geometry of the space of variation 

Before considering the space of variation in the most general case in model (|1.2[) , 
we explore four particular examples with a given common shape. The first two 
examples are one mode cases, the two other examples are two-mode cases. The 
one mode cases illustrate what the geometry of the space of variation is for a 
linear and a nonlinear mode, they are: 

(a) Linear mode example: vertical shift of curves around a common shape. In 
this example, 6{ = h i} and R(9i,t) = z(t) + hi in model p. 21) . 
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(b) Nonlinear mode example: horizontal shift of curves around a common 
shape. In this example, 6i = and R(9i,t) = z(t — irij) in model Q1.2p . 

The two modes examples are: 

(c) Linearly separable model example: simultaneous variation of curves along 
the vertical shift and horizontal shift. In this example 0, = (hi, mi), and 
R(0i,t) = z(t — mi) + hi in model (|1.2[) . We call this model linearly sep- 
arable because we can write the function R((nii, hi), t) as the sum of two 
different functions, each depending on only one parameter. More precisely, 
R((rrii,hi),t) — Ri(mi,t) + i?2(/ii,t) where R\(mi,t) = z(t — m,) and 
R2{hi,t) = hi. As we will see in Section HJ the geometry of the space of 
variation and the decomposition will be simpler to describe in this case 
because of this property. 

(d) General case example: simultaneous variation of curves along the hori- 
zontal shift and generalist-specialist. In this example, 6i = (irii,Wi), and 
R(0i, t) = Wiz(wi(t — mi)) in model (|1.2|) . As we will see in Section |4] defin- 
ing a distance that allows the decomposition will be less trivial in this case, 
but will rely on similar ideas defined for one-mode and two-mode linearly 
separable examples. 

For illustration, the curve variation as well as the space of variation is shown for 
each of these four examples with a parabola common shape in Fig.[3]and Fig.|U 
These four examples and their illustrations are discussed in details in Subsection 
12.11 In Subsection 12.21 we present the general result on the dimensionality and 
geometry of the space of variation for a set of curves of common shape. 

2.1. Examples in one and two dimensions 

To visualize the variation in the curve space and in the point cloud space, we 
sample z at three distinct points (ti,t 2 ,t 3 ). As illustrated in Fig.s [3] and [U 
sampling at three points allows for a representation of an infinite dimensional 
curve as a point in three dimensions. So, a set of parabolic curves in the curve 
space appears as a set of points in the point cloud space and the variation in 
the curves corresponds to variation of the points. 

The linear mode example, the vertical shift, is presented in the two left panels 
of Fig. [3l We see that when there is one mode of variation of the curves and 
it is linear, here vertical shift, the points in the point cloud space fall along a 
line. We call the line in this example the space of variation of the data. Note 
that when the mode of variation is linear, the arithmetic average falls within 
the space of variation. Note also that deviation of each data point from the 
mean could be measured by the Euclidean metric. The nonlinear mode example 
is presented in the two right panels of Fig. [3] We see that when there is one 
mode of variation of the curves and it is nonlinear, here the horizontal shift, 
the points in the point cloud space fall along a curve. We call this curve in 
this example the space of variation of the data. Note that when the mode of 
variation is nonlinear, the arithmetic mean will not fall in the space of variation. 
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Horizontal Shift 




Fig 3. Illustration of the vertical shift and horizontal shift modes in the curve space and the 
point cloud space. Two examples of one dimensional mode of variation, one linear and one 
nonlinear. From left to right, Panel 1: Parabolas vertically shifted. Panel 2: Visualization of 
the variation in 3d. Since the mode of variation is linear, the space of variation is a line and 
the mean (x) lies in the space of variation. Panel 3: Parabolas horizontally shifted. Panel 4'- 
Visualization of the variation in 3d. Since the mode of variation is not linear, the space of 
variation is a 1-dim manifold or curve. 




Fig 4. Curve space, and space of variation in the two dimensional case. From left to right, 
Panel 1: Toy parabola curves varying simultaneously along the vertical shift (parameterized 
by h) and the horizontal shift (parameterized by m). Panel 2: The space of variation cor- 
responding to Panel 1. It is a surface. Panel 3: Toy parabola curves varying simultaneously 
along the generalist- specialist (parameterized by w) and horizontal shift (parameterized by 
m). Panel 4 : The space of variation corresponding to Panel 3. It is a surface. Each curve in 
the curve spaces in Panel 1 and Panel 3 corresponds to a point in the space of variation in 
Panel 2 and Panel 4- 
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Similarly, using the Euclidean distance will not be the best characterization of 
deviation along the space of variation. For the one mode case, we will propose in 
Section [3] and Section [4] to use the arcdistance, or distance along the manifold, 
instead of the Euclidean distance as measure of proximity. We also propose to 
use the Frechet mean and variance, or mean and variance along the manifold, 
as a measure of center and spread in the manifold. 

Fig. H] illustrates the case of simultaneous variation along two modes, vertical 
shift and horizontal shift in the left two panels, horizontal shift and gencralist- 
specialist in the right two panels. We sec in both examples that when there 
are two modes of variation, the points in the point cloud space fall in a plane 
or a surface. We call the surface in each example the space of variation of the 
data. When the space of variation is curved, the mean does not fall in the 
space of variation and the Euclidean metric is not appropriate for proximity. As 
in the one-dimensional case, we propose in Section [3] to use the Frechet mean 
set for a center in the manifold of variation. However, in contrast to the one- 
dimensional case, we do not use the arcdistance or shortest distance along the 
manifold as measure of proximity. Instead, we define in Section [4] new metrics 
by a Pythagorean like formula using the one-dimensional arcdistances along 
each mode. The advantage of these new metrics is that, by construction, we can 
decompose variation in the manifold in an ANOVA type decomposition along 
the modes of interest. As discussed in Section HI the construction of this metric 
is simpler in the linearly separable case than in the general case. 

2.2. General dimension 

For a variation or a set of variation around a common shape, the space sup- 
porting the points in the point cloud space is called the space of variation. In 
the one-mode examples, the line or the curve were the space of variation. In the 
two-modes examples, the surface is the space of variation. In general, given data 
as in model p.2[) with a common shape R(8, t) and parameter space B C K d , 
we define the the space of variation V as the subspace of M. d such that, 

V = {(xi, . . . , x d ) e K d ; 36 e 6 such that Xj = R{9, t,)Vj = l,...,d.} 

Theorem 12 . II states that under certain conditions on the common shape and the 
parameter space, the space of variation is a manifold of the same dimension d! 
as the parameter space 

Theorem 2.1. For a fixed sampling vector t £ K d , let the function i?(.,t) be 
such that 

R(.,t):R d ' -> V 

e i ► R{e,t) 

If R(.,t) is an homeomorphism from M. d to V, then V is a manifold of dimension 

df. 
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See proof in Appendix. For the caterpillar data, this condition is satisfied for 
any polynomial common shape z of degree higher than 2 and the three modes 
of variation parameterized by (w, m, h) in model (1). The metrics dy which we 
define in the manifold of variation V in Section [4] using the arcdistances along 
each mode will additionally require that R(.,t) be differentiable. 

3. Variation in a manifold 

In the previous section, we visualized the space of variation in the case of a 
one-dimensional mode or a two-dimensional simultaneous modes. The geometry 
of the space is non-euclidean in the presence of nonlinear variation, so the usual 
notions of mean and variance are not representative of the center and spread 
of the distribution. We first define in this section an appropriate mean, as a 
measure of center of variation, and variance, as a measure of spread along the 
manifold. The goal is to use these new mean and variance to define a ratio to 
quantify nonlinear modes. 



3.1. Frechet mean and Frechet variance 

It is well known that given a set of data in a nonlinear manifold, the arithmetic 
mean will not necessarily fall in the manifold. Similarly using the Euclidean 
distance as a measure of variation around the mean shape is not appropriate 
in the manifold because this distance is not a good measure of proximity in 
a nonlinear space. These facts have been discussed extensively in the shape 
analysis literatur e, and illustrated with a toy example in functional data in 
Izem et al. (2003). Because the usual definitions of mean and variance in a linear 



space are not meaningful measures of center and spread in a nonlinea r space 



Freche t generalized the notions of mean and variance to manifolds in iFrechet 
( 1948h . The generalization proceeds as follows, for a real random variables X in 
a Euclidean space with measure /x, the expected value E(X) is the point in the 
space which minimizes the variance, i.e. let F(y) = J \ \X — y\ \ 2 d/j,(x), then 

E(X) = argmin yeR (F(y)) and Var(X) = F(E(X)). 

For a metric manifold (M, d) with measure fi, let the Frechet function be, 

F(y) = f d 2 (x,y)dfi(x). 

This function is well defined if J M d 2 (x,y)dfi(x) < oo,Vy € M 
In a metric manifold (M, d), the Frechet mean set Ef{X) of a random variable 
X , in the manifold M, with probability measure /i, is the set of points on the 
manifold which minimize the function F(y). The Frechet variance Varp{X)\s 
the value F(Xp) for any Xp in the Frechet mean set Ep(X). i.e. 

X F e E F (X) iff F(X F ) = in£ yeM F(y). 
Varp(X) = F(X F ) 
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Note that although we can have more than one Frechet mean, we can define 
a unique Frechet variance. Note also that the Frechet variance is a scalar, and 
not a matrix, even for manifolds of dimension d! > 2. In this paper, we are 
concerned with functional data X, so the Frechet mean set is a set of functional 
data. 



3.2. Quantifying the variation 

In our context, let z\, . . . , z n be a functional data set of common shape as in 
model (|1.2|) . where each zt is associated to the regression functions Ri and 
parameter Bi in a closed set 0. Thus, Ri = R(0i,t) is in the metric manifold 
(V, dy) and an intuitive estimate of the Frechet function is 

n 

F n {R) = Y J d v {R i ,R). 

i=l 

We can also derive an estimate of the Frechet mean set Ep, n (R) and an estimate 
of the Frechet variance —SSM, where 

R £ Ep.n(R) & F n (R) = mm ReV F n (R) 

n 

SSM = Y, d v(Ri,R) 

i=l 

We propose to quantify the variation in the manifold by the following ratio, 

RSS= ^S SM (3.1) 

SSM + SSE 

where 

n 

SSE = ^2\\zi- R l \\ 2 

i=l 

Note that for linear modes, if we take dy to be the Euclidean metric, this ratio 
corresponds to the usual ratio of sums of squares used in PCA. Note also that 
the choice of the distance dy is critical, it is the main building block to the 
decomposition we show in this paper. In one dimension, we choose dy to be 
the arcdistance, or the distance along the curved one dimensional manifold. For 
a diffcrcntiable one-dimensional manifold with parametrization 7 : £ 1 h 
7(0) e W\ we have that 

Arcd( 7 (0i) ! 7(^)) = J ' ||^7(0)ll^;V0i,0 2 . (3.2) 
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In higher dimensional spaces of variation, our choice of dy is motivated by our 
decomposition goal, i.e. we define dy such that 

n 

by definition, SSM = ^ d v (Ri,R), (3.3) 

n d' 

by construction d v (R il R) = SSM^ (3-4) 

i=l fc=l 

and ^S5 fe = __f SMk (3.5) 

SSM + SSE 

where RSSk quantifies the variation along mode k. Constructing the distance 
dy is challenging in manifolds since the notion of orthogonality is local whereas 
it is global in linear spaces. We reach this goal by incorporating in dy the 
arcdistances along each mode in a Pythagorean like formula. We first define 
these distances more formally in Section [4] We show in Section [5] that in each 
case, with our choice of distance, the estimates above are consistent estimates of 
the Frechet mean and variance. Moreover, we find that in the one dimensional 
and linearly separable cases the Frechet mean and its estimate (for a given n) are 
unique. Finally, we show in the general case, consistency. Using these estimates 
of Frechet mean and variance, we use the proposed the above ratios RSS^s to 
quantify nonlinear modes in a manifold. 



4. Distance and decomposition of variation in the manifold 

In the one dimensional case, the metric dy is simply the arcdistance and the 
quantification is easy to do as seen in Subsection 14. 11 In the two-dimensional 
case and higher, the geodesic distance does not allow for a straightforward de- 
composition. So we define the metric as a function of the arcdistances along each 
mode. As seen in Subsection l4.2l the expression of this metric is easy in the lin- 
early separable case. The distance is not trivial to generalize to the general, non 
linearly separable, case. We first see how to generalize it in the two-dimensional 
case in Subsection 14.31 Finally, Subsection 14.41 discusses the generalization of 
this metric to higher dimensional spaces of variation. 



4-1. One mode 

A natural choice of distance dy(x,y) between two points x and y in the one- 
dimensional manifold of variation is the arcdistance, Arcd(x,y). We show in 
Section [5] that in the one dimensional case, the ratio RSS defined in Equation 
(|3.1|) is well defined, the Frechet mean is unique, its estimate for a sample of 
points of size n is also unique, and this estimate is consistent. 
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Fig 5 . Example of a space satisfying the equality of path condition and a representation of the 
transformation of the space into R 2 . Left: Representation of the space of variation when the 
two modes are vertical shift (parameterized by h) and horizontal shift (parameterized by m). 
Right: Representation of the transformation of the space of variation. The distances along 
the curves (dashed line type) and the distance along the lines (solid line type) are the same 
as in the original space of variation in the left panel. 

4-2. Multiple simultaneous modes, linearly separable case 

We described in Scction[5]an example of a 2-dimcnsional linearly separable space 
of variation. It is shown in the two left panels of Fig. [4j the two modes of variation 
are the vertical shift and the horizontal shift. We first review the equality of path 
property in Subsection 14.2.11 and define it in the general case. In Subsection l4.2.21 
we use the property to construct the metric and the decomposition of variation. 

4- 2.1. Linear separability and equality of path 

We illustrate the geometry of the space of variation in the two dimensional, 
linearly separable case, in Fig. We see in the left panel of Fig. [5] four points 
{R((rrii, hj),t)ij=x t 2, mi < m^hx < h 2 } in the space of variation. The two 
parallel solid lines correspond to the vertical shift variation for two different 
fixed values of location parameters mi and mi. Similarly, the two parallel dashed 
curves correspond to the horizontal shift variation for two different fixed values of 
the height parameter hi and h 2 . To go from one point i?((mi, hi),t) to another 
i?( (777,2, /12), t), by two steps, along the curves of variation, we have two possible 
paths. The first path goes first along the top dashed curve m £ (7711,7712) 1— > 
R((rn,h\),i), then along the left solid line h <E (ft.1,/12) l— ► R((m,2,h),t). The 
second path goes first along right solid line h S (ft.1,/12) R{{mi,h),t), then 
along the bottom dashed curve m G (7711,7712) 1— > R((m, /12), t). For this special 
case, the two solid line segments and the two dashed curve segments from the two 
different paths are of equal length. This equality of path property is satisfied by 
any manifold generated by a linearly separable model. Namely, the arcdistance 
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between two points along the space of variation if we fix one of the parameters, 
depends only on the other parameter. For our example, we have that 

|i?(m, hi) — R(m, /i2)| 2 = (z(t — m) + h\ — z(t — m) — h-i) 2 = (hi — ft.2) 2 ; VTii, /12, m 

so this distance along the surface of variation between the points (m, hi) and 
(m, /12) depends only on hi and hi and not on m. In addition, we have as in 
Equation 13.21 that 

rm 2 g 

Arcd(i2(mi, h), R(m2, h)) = / II— — z(t — m)\\dm; Vmi, 7712, h 

J mi dm 

Since the derivative -^z does not depend on h, the distance along the surface 
of variation between the points (mi, h) and (m2, h) depends only on mi and 1712 
and not on h. In general, we define a linearly separable model as follows. 

Definition 4.1. A regression function R(6, t) in model HI. 2]) for parameter 9 = 
(Qi , . . . , 6d') in R d is linearly separable if there exists d' differ entiable functions 
Ri, . . . , Rd> such that 

d' 

R{6,t)=Y^Ri{0 i ,t) 

i=l 

The linearly separable model satisfies the property of equality of paths. We 
say that we have equality of paths in a d! dimensional manifold of variation V 
with parameter space if the distance between two points, which share the 
same values in d! — 1 parameters and differ on one parameter, depend only on 
the varying parameter, i.e. for all (0j i)i<j<d', (0j,2)i<j<d' in © and for all j 

Arcd (R(9i : i, . . . , 9j : i, . . . , 6d',i;t),R(9i t i, . . . , 6j, 2 , ■ ■ ■ , 9d',\'i *)) ~ ^s jiU e j:2 ^ 

where Cg j lt g. 2 is a function depending only on the values of 6j t i and 9j t 2 of the 
jth parameter, and not on the values of other parameters (#j,i)i<i<d' ; i/j. 



4-2.2. Distance and decomposition 



Using this property, it is straightforward to define a metric in the manifold 
which will decompose the total variation in the data onto the modes of interest 
using the following metric 

Theorem 4.1. For a d' (d! > 2) differ entiable manifold of variation V generated 
by a linearly separable model the non-negative function dy defined below is a 
distance in V 



,e d ,,i),t),R((e 



1,2; 



d v : V x V 



M,t)) 



\ 
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We show that dy satisfies the conditions of a distance in the Appendix. 
The proof shows that the space of variation for a linearly separable model is 
homeomorphic to a subspace of M. d . Furthermore, the function dy we defined 
above corresponds to the Euclidean metric in this subspace of M. d . The right 
panel of Fig. [5] illustrates this mapping in the case of 6! = 2. This mapping- 
preserves the arcdistances along the curves of the modes of variation, and dy 
corresponds to the Euclidean distance in this mapping. 

Finally, we have a simple decomposition in the linearly separable case. Recall 

that by definition SSM = Y^i=i dy (-^»> By construction from the above 
theorem, we have that 

d' 

d v {Ri,R) = Cf k 
fc=i 

where Cf k is a function of the kth mode. So, finally we have the decomposition 
SSM = SSM k , where SSM k = C lk and RSS k = 



k=l i=l 



SSM + SSE 



By construction, RSSk quantifies the variability around the Frechet mean along 
mode k. 



4-3. Two modes, general case 

We described in Section [2] an example of a 2-dimcnsional general case space of 
variation. We saw in the two right panels of Fig. [4] that when the two modes of 
variation are horizontal shift and generalist-specialist, the space of variation is 
a two-dimensional manifold. The right panel of Fig. [6] is a representation of the 
same space. In this figure, the three parabolic curves correspond to the hori- 
zontal shift variation for three different fixed values of the width parameter and 
the three lines correspond to portions of the curve representing the generalist- 
specialist mode for three different fixed values of the horizontal shift variation. 
As illustrated in Fig. [6j in the general case we do not have the equality of path 
property. More precisely, the paths for going in two steps from one point to 
another in the manifold will not be equal. We define in Subsection 14.3.11 and 
Subscction l4.3.2l two different metrics d\,o and e?2,o by considering two different 
mappings L\ o and £2,0 of the space of variation into a subspace of M 2 . Each 
metric is homeomorphic to the Euclidean distance in the transformed subspace 
of R 2 . We finally combine the two distances in Subsection l4.3.3l to define a metric 
dy in the manifold indexed by the weight 7 as 



dy(x,y) = yJjdi,o(x,y) 2 + (1 - 7)^2,0(2;, y) 2 - 

As discussed in Remark 14.11 we chose 7 to be 1/2 in our analysis. We see in 
Subsection 14.3.41 how we can use this distance for the decomposition of interest. 
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Because each mapping is a different transformation of the manifold into a plane, 
we need to define each transformation with respect to an origin O = i?o,o = 
R((ao, /?o), t) in the space. We defer the discussion on the choice of origin to 
Section [5] 

4-3.1. First transformation 

The first transformation L\ o maps the space of variation V into K 2 by pre- 
serving the arcdistances along the two curves which cross the origin O. The two 
curves which cross at the origin are the dotted curves in Fig.[6j and the image of 
the transformation Li.o is shown in the bottom right panel of Fig. [6] We define 
the distance di t o in V as the Euclidean distance in Li,o{V). More formally, let 
Ri.k = i?((o,-,j9fe),t) for i,k = 0, 1,2, then 

Li,o : V -» M 2 

Ri t i h- > (?/, Q such that 

?/ = sign(ai - a )Arcd(i?i i0 ,i?o,o) 

C = sign(/?i - /3 )Arcd(i? ,i, J Ro,o) 

After transformation, we define the metric di^o between two points in the man- 
ifold as equivalent to the Euclidean distance in the space Li,o(V) such that 

di,o ■■ V x V -> K+ 

(i?l,l,-R2,2) l— > dl,o(-Rl,lj #2,2) 

di,o(-Ri, 1,-^2,2) = I l-^i, o(-Ri, 1) — ^1,0(^2,2)11 

The following Proposition states that this function defines a metric, 

Proposition 4.1. For all origins O, and for a differ entiable manifold V , the 
function di t o defines a metric on V . 

This Proposition is equivalent to showing that L^o is isometric. By using 
simple algebra, we can further simplify the expression of d\.o as a function of 
the parameters of variation rather than the linearizing function as, 

Corollary 4.1. 

d? )0 (#1,1, #2,2) - Arcd 2 (R hQ ,R 2 ,o) + Arcd 2 {R Q , u R Qa ) 

4-3.2. Second transformation 

The second transformation L2.0 maps V into R 2 by preserving the arcdistances 
along the curves which cross the points of interest. For example, the curves 
which cross at point 1 in Fig. [5] are the two dashed curves. Similarly, the two 
curves which cross at point 2 in Fig. [6] are the two dotted and dashed curves. 
The image of the space of variation by this transformation £2,0 is shown in the 
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top right panel of Fig. [6] As for the first transformation, we define a metric in V 
as the Euclidean distance in L2p(V). More formally, the second transformation 
L2,o is defined as follows, 

L 2 p:V -> R 2 

Ri,i l ~ > (v> C) such that 

T) = sign(ai - a )Arcd(i?i4, i? ,i) 

C = sign(/3i -/3 )Arcd( J Ri,i,J?i > o) 

We can then define a function difi which corresponds to the Euclidean distance 
in the linearized space. 

d 2 p :VxV -> R + 
^2,0(^1,15^2,2) = ||i2,o(^i,i) — i2,o(^2,2)|| 

We can rewrite ^2,0 as a function of the parametrization as follows 

<k,o( R h 1^2,2) = V 'A 2 + B 2 , where 

A = sign(ai — ao)Arcd(i?i i i, i?o,i) 

- sign(a 2 - a )Arcd(i? 2 ,2, i?o,2) 
B = signer -/3o)Arcd(i2i,i,i?i,o) 

- sign(/? 2 - /?o)Arcd(i?2, 2 , i? 2 ,o) 

Proposition 4.2. The function dip satisfies the following conditions: (i) dip 
is non-negative, (ii) d 2 p *s symmetric. (Hi) dip satisfies the triangular in- 
equality. Moreover, if the mapping £2,0 is one-to-one, then dip satisfies the 
condition 

d2,o{Rl,l, ^2,2) = = i?2,2 

which, in addition to the above conditions, makes dip a, distance. 

Note that if the space V is linearly separable, then the two functions dip 
and g?2.o are the same, and they are equal to the distance defined in the linearly 
separable case. This result is stated in the following proposition and proved in 
the Appendix 

Proposition 4.3. When V is linearly separable, dy is the distance defined in 
Subsection \4-2.2\ and dip and di_o are as defined above, we have that d\.o = 
<^2,o = dy for all origins O in the manifold. 

4-3.3. Distance in the manifold 

The fact that dip is not always a distance in the general case is not an issue 
since we can combine it with dip to define a distance in the space of variation 
as stated in Proposition 14.41 proved in the Appendix. 
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Proposition 4.4. The non-negative function dv.o.-y defined as 

dv,o,j ■ V x V -> R+ 

(i£i,i,-R2,2) dv,o,y(Ri,i,R2a) such that 



dv,o n ( R i, 1.^2,2) = y 7^1,0(^1.15 ^2,2) + (1 -7)^2,0(^1.1' -R2.2) 

is a metric for all 7 G [0, 1) in a differ entiahle manifold V . The parameter 7 is 
a weight of each distance. 

Remark 4.1. When analyzing the caterpillar data (see results in Section^, 
the distance in the manifold generated by the generalist- specialist and horizontal 
shift corresponded to the equal weights case (i.e. 7 = \) 



dv,o{Ri,i,R2,2) — Y 2^1. 0(^1, i' ^2,2) + 7j ^2,0 O^i.is ^2,2) 

We chose equal weights because there was no a-priori reason why one path should 
be weighted more than the other path. We repeated the decomposition by changing 
the distance from full weight on one distance (7 = 0) to full weight on the other 
distance (7 = 1) and the results of the decomposition were similar. 



4-3.4- Decomposition 

We have finally defined all the tools for our proposed decomposition. Let Ro = 
Rq q 2 be an estimate of a Frechet mean (i.e. in the Frechet mean set) then we 
can see how to decompose the variation with the distance dyo 



SSMo = Y, d vA R ^ R o) 



dv,o{RiAiRo) — -jd\ (Ri t i, Ro) + 2^2,o(^M> Ro) 
dl t0 (Ri,i,R ) = Arcd 2 (R lfi ,Ro 1 .o) + ^cd 2 (R , i ,R o 2 ) 
dl.o{Ris-,Ro) = (sign(o!i - a )Arcd(i? M , i? ,i) 

\ 2 

-sign(Oi - a )Arcd(i? o ,i? Q 2 ) ; 

+ (sign(A - /3o)Axcd(Ri t i,Ri,o) 

- sign(0 2 - /3 )Arcd(^ o , Rd u o) 

The advantage of defining the distance dy,o by d\ t o and c?2,o is that we can 
reorganize the terms of SSMq such that 
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Fig 6. Example of a 2 dimensional space of variation in the general case and representation of 
the corresponding two transformations. Left: Representation of the space of variation when the 
two modes are the horizontal shift and the generalist- specialist. The three points in the space of 
variation correspond to three different location and width parameters. For illustration, one of 
the points is taken to be the origin in the manifold. Top right: Representation of the second 
mapping of the space of variation into R 2 . This transformation preserves the arcdistances 
of curves which cross at the points of interest. Bottom right: Representation of the second 
mapping of the space of variation into M 2 . This transformation preserves the arcdistances of 
curves crossing at the origin. 



SSM = SSM h0 + SSM 2 ,o 
where 



SSM h o = -^2Arcd 2 (R l , Q ,R di a ) + - ^(sign(a.; - a )Arcd(i? M , i? 0:4 ) 



and 



- sign(Oi - a )Arcd(i?o, -R 0) o 2 

SSM 2 ,o = ^Arcd 2 {R Q , u R o d2 ) + -(sign(ft - A,)Arcd(i? M , %,) 

- sign(0 2 - /3 )Arcd(^ o , i?Q i ) 



For a given origin O, the term SSMi q quantifies the variation along the mode 

parameterized by a and the term SSM2.0 quantifies the variation along the 
mode parameterized by (3. 



4-4- Multiple simultaneous modes, general case 



We constructed in Subsection I4.2I a metric and a decomposition for multiple 
dimensional case satisfying the linearly separable case. More generally, when 
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have d! simultaneous modes of variation, we might have the linearly separable 
property satisfied for a subset of parameters but not for all parameters. For 
example, in model we can write R((w, to, h), t) = R\((w, m),t) + R 2 (h, t) 

where Ri((w,m),t) = w x z(w(t — to)) and R2(h,t) = h. We can define a 
distance in this three dimensional manifold as 

d v (R((wi,m 1 , hi), t),R((w 2 , m 2 , h%), t)) 

= y / <(i?i(K,TOi),t),i? 1 ((u; 2 ,TO 2 ),t) + ||/n - /i 2 || 2 

where mi), t), i?i((ui2, ^2), t) is the metric defined for a two-dimen- 

sional space of variation in Subsection 14.31 

Using the same tools we developed in Subsection 14.31 we can generalize the 
above metric and decomposition to a d! dimensional space where the equality 
of path property is not necessarily satisfied by any subset of parameters. In a 
2-dimcnsional manifold, we found two possible mappings of V, each preserving 
one (out of two) possible 2-steps paths between two points along the curves of 
variation. In the general rf'-dimcnsional manifold, the number of mappings is 
the number of possible d'-step paths between two points along the curves of 
variation which is (d'l). For each possible path i, we can define a mapping L;.0; 
which allows us to define a function di } o- We define the metric in the manifold 
to be do with weights 7, associated to each path metric. More precisely, for any 
two points R\ and R2 in the manifold 

d'\ 

d 2 Q (R 1 ,R 2 ) = _]T 7 ,4 o(i?lji?2 ) ; ^ 7l = 1. 

i— 1 i 

Each df q is the sum of d! terms, each depending on one parameter. Thus, we 
can reorganize the sums of square in the previous formula to be the sum of d! 
terms, each depending on one parameter. 

5. Consistency and choice of origin 

We state in this section the main theorems proving the consistency of our es- 
timates with the distances defined in the previous section. We also discuss our 
choice of origin in the space of variation used in the non-separable case. 



5. 1 . Consistency 



Theorem 15.11 states the uniqueness of the Frechet mean in the one dimensional 
case an d the (f -dimensional case linearly sep arable case. We use Theorem 2.3 



from iBhattacharva and Patrangenarul (|2003[ ) to prove consistency of our esti 



mates in the general case in Proposition [57T] 

Theorem 5.1. For i.i.d random variables R\, . . . , R n of measure \x in (V, dy) of 

finite Frechet mean and Frechet variance. For V a one dimensional differ entiable 
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manifold or a d! dimensional differentiable manifold linearly separable. We have 
that the Frechet mean Rf is unique, for all n its estimate R from the data is 
unique and 



1 



dy(R, Rf) 
-SSM - Var F 



(a.s) 
(a.s) 



See proof in the Appendix. S howing consistency in the general case is not 
trivial. We use Theorem 2.3 from lBhattacharva and Patrangenarul (|2003l ) which 
establishes consistency of the Frechet mean estimates for a complete manifold, 
to prove the following corollary 

Proposition 5.1. For all fixed origins O in a metric manifold (V,dv t o) 



-SSM C 
n 



Varp t o a -S 



5.2. Choice of an origin 

We propose to use an origin O = R{{a , flo), t) in a compact set K in V which 
is a minimizer of the Frechet Variance, i.e. 

Var F , 6 = min ^Var F , , where Var F , = f ^(R, R F ^(R h 

and Rf,o is an element of the Frechet mean set. This choice of origin is conser- 
vative, it will guarantee that the estimate of the variance in the manifold will 
not be inflated. A question of interest is, can this variance be estimated from 
the data when this origin is not set in advance? Let F n (0) be the estimate of 
the Frechet variance from the data, i.e. 

1 1 ™ 

F n (0) = -SSM F ,o = -Yd 2 (R l , l ,R n ,o) 
n n * — ' 

i=l 

where i? Mj o is the estimate of the Frechet mean associated to the origin O and 
the data . . . , R n ,n- An estimate of F n {0) from the sample is the minimum 
of F n (0) over a compact set, i.e. the estimate is 

F n {O n ) = min O £KF n (0) 

The following proposition states that the sequence F n (O n ) converges to the 
desired variance function Var^, q . 

Proposition 5.2. The Frechet sample estimate at the sample minimizing origin 
O n converges to the Frechet variance at the minimizing origin O, i.e. F n (O n ) — > 
Var Fd . 

See proof in Appendix. 
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6. Implementation and extensions 

We have discussed in this paper a useful decomposition of the variation in the 
manifold of variation. In the motivated model, the data does not lie in the 
manifold but is projected onto the manifold. More specifically, the data z; £ M. d 
is such that Zi = Ri + ti where Ri is the projection of Zi onto the manifold and 
Ci is additive error. The projection is the point in the manifold which is closest 
to the data in M. d , i.e. which minimizes SSE such that 

H^i - = minsgyll^ - x\\ 

This projection might not be unique but we can use diagnostic plots shown 
in Fig. [7| to detect multiplicity of fits. We can also use local information to 
choose between multiple projections, so that for data which are close together 
in M. d , their projections onto V should be close together. We also suppose in this 
paper that the common shape function is known or could be estimated from the 
data. For the caterpillar example, the common shape function is estimated by a 
polynomial in an iterative procedure. Each iteration includes two steps, in each 
step parameters arc fit to minimize SSE. Given initial parameters of variation, 
the first step finds the optimal values of the parameters of the polynomial. These 
polynomial coefficients are then used in the second step to find the optimal 
values of the parameters of variation. The parameters found in this iteration 
are used as initial parameters of the following iteration and this procedure is 
repeated until the algorithm converges to an optimal solution. This iterative 
procedure could be easily generalized for a common shape in any parametric 
family satisfying the condition of Theorem 12.11 This iterative procedure could 
also be generalized to estimate non-parameterically the com mon shape as well 



as the param eters of variation using algorithms described in iWang and Gasser 
< 1997L Il999h . 



Finding the geodesic distance along one-dimensional manifolds is central to 
our method. This distance could be computed analytically using Equation 13.21 
Another approach based on approximation is to consider the path length be- 
tween two points in a one-dimensional manifold as the sum of lengths of small 
segments along the curve. The length of each small segment is approximated by 
the corresponding Euclidean distances. 



7. Results 



Our TMV analysis has been implemented and used by biologists to decompose 
v ariation along modes of inte rest in thermal performance curve s of caterpillars 



nation alo ng modes oi interest m tnermai periormance curve s oi caterpu 
Izem and Kingsolverl (|20051 ) , and viruses in Knies et all (|2006l ). The fitted 



pa- 
rameters and fitted template polynomial optimized the weighted sums of squared 
error, weighted by the sample sizes of each family. The results of the decompo- 
sition on the motivating caterpillar data are presented in Table Q] Fig. [7] shows 
a visual diagnostic of the fits. 
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The data have only six measurements per curve, as shown in Fig.JTJ Thus, we 
assumed for simplicity that the template shape is a polynomial of degree 4 in the 
results shown in Table [T] and the top two panels of Fig. [71 For identifiability, we 
assumed further that the fitted height parameters sum to zero. The polynomial 
fit to each curve as well as the template shape diagnostic fit are shown in the top 
two panels of Fig. [7j We see in the top left panel of Fig. [Jj that the variation of 
the fitted polynomials closely matches the variation in the original data shown 
in Fig. [TJ The top right panel of Fig. [7J shows the warped data compared to the 
fitted template shape polynomial of degree 4. Each curve i was warped by its 
fitted parameters (m,-, m,, hi) so that all curves could be compared on the same 
scale to the fitted template shape. More precisely, a warped curve i represents the 
warped growth rate (zj — hi)/wi plotted at the warped temperature (t — rhijWi, 
where z,; and t represent the observed growth rate and temperature vectors of 
curve i. We found that this representation is an excellent graphic diagnostic of 
the model fit. Under the assumptions of our model, after warping, we expect 
that the global maxima of all curves is at 0, that the warped curves are aligned 
and that the template shape is lying in the middle of the warped curves. We 
see in the top right panel of Fig.[7]that these expectations are mostly met. One 
of the curves appear to be an outlier, it corresponds to a family with small 
sample size. Although the fitted polynomial lies in the middle of the warped 
curves, the template curve seem to slightly overestimate the growth rate at low 
temperatures, which is due to the limitation of an oversmooth polynomial fit. 
In contrast to this good fit, the two bottom panels of Fig. [7] show the warped 
data compared to two fitted template shapes of a polynomial of degree 6 in the 
left panel and of a polynomial of degree 3 in the right panel. These fits are not 
as good as the polynomial of degree 4 fit because we see that in both panels the 
fitted maxima (at zero for warped data) is outside of the measurement range 
for many curves and the warped curves are not aligned. Because the curves 
are not aligned, a small set of curves contribute to fitting a large range of the 
polynomial of degree 6 (colder temperatures) and different groups of curves fit 
different regions of the template polynomial of degree 3. 

We see in Table Q] that our model explains about 5/6 of the variation in the 
data, which is surprisingly good since all the modes have biological meaning. 
Very little variation is occurring along the vertical shift which suggests that 
selection of caterpillars with high growth rate at a particular temperature in 
one generation, will not result in individuals with high growth rate overall tem- 
peratures in other generations. Our decomposition separates the contribution of 
the generalist-specialist variation (27.11%) from the contribution of the horizon- 
tal shift variation (45.76%), which was not possible using Euclidean methods 
such as PCA and conventional ANOVA. We tested the robustness of our re- 
sults using 500 bootstrap samples, where each bootstrap sample was obtained 
by sampling with replacement the family curves. We fitted the parameters and 
derived the decomposition for each bootstrap sample and we see in Table Q] 
that our decomposition of the variation hold. 
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Fig 7. Top left: Fitted curves to caterpillar data. Fitted curves are transformations of a 
common template by model \T~l\ Top right: Fitted template shape of degree 4, = —4.92 X 
10~ 5 - 2.41 X 10~ 3 X t - 0.032 xt 2 + 2.19 X t 4 , and warped data. The template lies in the 
middle of the rescaled data, it fits the data fairly well. Bottom left: Fitted template shape of 
degree 6 and warped data. Bottom right: Fitted template shape of degree 3 and warped data. 



Table 1 

Decomposition of the Variation and Bootstrap Intervals. Percentages of variation are 
computed according to Eguation 5.1 





vertical shift 


horizontal shift 


gen-spec 


Total-Model 


Result 


11.24 


45.76 


27.11 


84.11 


Bootstrap (mean,sd) 


10.41,5.46 


41.71,7.48 


29.16,8.01 


81.28,7.74 


Bootstrap (median, 
5th, 95th percentiles) 


10.64 
2.87,19.88 


27.86 
27.86,52.81 


29.57 
11.8,40.60 


82.89 
67.47,83.38 
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Appendix A: Appendix 

Theorem 2.1. For a fixed sampling vector t £ M. d , let the function R(.,t) be 
such that 



R(.,t):R d ' -> V 

9 i ► R(0,t) 
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IfR(., t) is an homeomorphism from R d to V, then V is a manifold of dimension 
d'. 

Proof. To show that V is a manifold of dimension d 1 we need to show that it is a 
separable topological space and that every neighborhood Vl in V is homcomor- 
phic to a neighborhood in W l . V is a topological subspace of M. d with topology 
induced by the topology in (R d , ||.||). Since R d is separable, and R(.,t) is an 
homeomorphism then V is also separable. Since R(.,t) is an homeomorphism 
from {R d> , ||.||) to (V, \\.\\), then every nei ghborhoods Q in V is homcomorphic 
to a neighborhood in M. d . □ 

Theorem 4.1. For ad' (d! > 2) differentiate manifold of variation V generated 
by a linearly separable model the non-negative function defined as 

d v : V x V -> R + 

(R((0i,i,. ■ ■ , e d ., 1 ),t),R{{e 1 ,2, 8 d ,, 2 ),t)) v 



d> 



a distance in V 



Proof. We will show this result in the two dimensional case. In dimension d', 
the result would follow by induction using the same inequalities as for the two 
dimensional case. In two dimensions, dy is defined as 

d v : V x V -> M+ 
(R((a 1 ,f3 1 ),t),R((a 2 ,f3 2 ) 7 t)) ^ s/c^+CjJ 2 

where 

C au a 2 = Arcd(i?((ai,/3),t),i?((a 2 ,/3),t))V/3, and 
C PuP2 = Arcd(i?((a,/3 1 ),t),i?((a,/3 2 ),t))Va 

We need to show that (i) dy is non- negative and symmetric, (ii) dy(x,y) = 
iff x = y, and (Hi) dy satisfies the triangular inequality. Property (i) is satisfied 
by definition. Let us show property (ii): if dy (i?((ai,/?i),t), R((a 2 , (3 2 ),t)) = 0, 
then 

C ai ,a 2 = C/3i,/3 2 = 0j So, . 



Arcd(i?((ai,/3),t), J R((a 2 ,^),t)) = 0, and 
Arcd (R((a, fa), t), R((a, (3 2 ), t)) = 0, Va, V/3 

Since Arcd is a distance, this implies that /?), t) = R((a 2 , f3), t) for all 

/3, and t) = R((a, f3 2 ), t) for all a. Since R is an homeomorphism, 

these two equalities imply that a\ = a 2l and /3i = /3 2 . So, i?((ai, /3i), t) = 
i?((a 2 , /3 2 ), t). Let us show property (Hi), the triangular inequality. Let Ri = 
R((a>i,f3i), t), we can show that 



d v (Ri,Ra) < d v (R 1 ,R 2 ) + dy(R 2 ,R 3 ) 
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is equivalent to 

d 2 v {Ri 7 R 3 ) < d 2 y{R l7 R 2 ) + d 2 v (R 2 ,R 3 ) + 2 x d v (R u R 2 )dy{R 2 ,R 3 ). (A.l) 
We have that, 

L.H.S of JAU) = C 2 aua3 + C 2 Pu03 (by definition) 

+ C PuPz + C %,03 + 2 x Cft aC/3 2 ,/3 3 ( b / c Arcd is a metric) 
< d v (Ri, R 2 ) + d v (R 2 , R3) + 2 x Cq 1iQ2 C Q2iQ3 + 2 x Cp 1 .p 2 C0 2 ^ 3 

Then, to show (|A.1[) . it is sufficient to show that 

Cai,a 2 Ca2,as3 -\-Cp l fi 3 Cfa$ 3 < dy(R\, R 2 )dy(R 2 , R3) 

which is equivalent to 

Cai, Qj^Los + ^Si.fts^fti.fts + 2 X C Q1iQ2 C Q2iQ3 C/3 1 ,/3 2 C/3 2i( 33 

< d v (Ri, R2)d v (R2, R3) 
which is equivalent to 

2 x C ai , a2 C a2 ^ a3 Cf3 lt f3 2 Cf3 2 j] 3 < C 2 l 0l2 Cp 2 p 3 + C 2 2 a3 Cp x p 2 
which is equivalent to 

< {C ai , a . 2 C f) 2 ^p 3 — C a2 ^ a3 Ci3 1 ,i3 2 ) 2 

Since the last inequality is always true, dy satisfies the triangular inequality. 
From (i), (ii), and (iii) we have that dy is a metric in V. □ 

Proposition 4.2. When V is linearly separable, dy is the distance defined in 
Subsection \4-2.2\ and d\^o and g?2,o ar ^ as defined in Section^ we have that 
di,o = d 2j o = dy for all origins O in the manifold. 

Proof. We first show the equality for gJ^o, then for d 2 ,o- When V is linearly 
separable, we have by Corollary 14. II that 

d\ Q (R hl , R 2 , 2 ) = Arcd 2 (R 1>0 , Rt,o) + Arcd 2 {R ,i, Ro,i) 

By definition of the linearly separable space, the first term Arcd 2 (Ri t o, R2,o) is 
equal to C„ Q2 using the notation of Subsection 14.21 and the distance does not 
depend on the origin O. Similarly, the second term Arcd 2 (Ro,i, Ro,2) is equal 
to ^ using the notation of Subsection 14.21 and the distance does not depend 
on the origin O. Thus, d\,o is equal to the distance dy defined in Theorem 
14.11 The formula for d 2 ^o{Ri,i,R2,2) given in Subsection 14.3.21 simplifies in the 
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linearly separable case to 



d 2 ,o{Ri,i,R2,2) = VA 2 +B 2 , where 

A = sign(ax - a )C aiiOt0 - sign(a 2 - a )C a2 , ao 

= Cqi,q 2 ! and 

B = sign(/3i - (3o)Cp ul3o - sign(/3 2 - /?o)Cft,/3o 

= C/3i,/3 2 ; 



Thus, c?2,o is equal to the distance dy defined in Theorem 14. II □ 
Proposition 4.4. The non-negative function dy.o.'y defined as 

dv,o,j ■ V x V -> R+ 

(Rl,l, i?2,2) >— > 6^0,7(^1,1,-^2,2) 

smc/i £/ia£ 



^,0,7(^1,1,^2,2) = yjd 2 (R 1A ,R 2:2 ) + (1 - 7)^2,0(^1.1,^2,2) 

is a metric for all j £ [0,1) in a differ entiable manifold V 

Proof. We prove this proposition in the general case that d is a distance where 
d = \J^d\ + (1 — 7)<f|, <ii is a distance and c?2 satisfies properties (i) to (m) 
in Proposition 14.21 We need to show that d is (i) positive and symmetric, (ii) 
d(x, y) = iff x = y, and fra) <i satisfies the triangular inequality (i) is satisfied 
by definition. Let us show (ii), if d{x, y) = 0, then d\{x, y) = and d 2 (x, y) = 
which implies that x — y since <ii is a distance. Let us show (Hi), the triangular 
inequality. We need to prove that 

d(x, y) < d(x, z) + d{z, y) 

or equivalcntly that 

d 2 (x, y) < d 2 (x, z) + d 2 (z, y) + 2d(x, z)d(z, y) (A.2) 
Let us consider the left hand side of the inequality 

d 2 (x,y) = ■yd 2 1 (x,y) + (1 -j)dl(x,y) 
< ldl(x,z) +jdl(z,y) 

+2-fdi{x, z)di{z, y) + (1 - ^)d\{x, z) 

+ (1 - -f)d 2 (z, y) + 2(1 - j)d 2 (x, z)d 2 (z, y) 

d 2 (x,y) < d 2 (x,z) + d 2 (z,y) 

+2~{di(x, z)dx{z, y) + 2(1 - -f)d 2 {x, z)d 2 (z, y) 

To prove the inequality (|A.2|) , it is sufficient to prove that 

7<ii(x, z)d\(z, y) + (1 - j)d 2 {x, z)d 2 (z, y) < d(x, z)d(z, y) (A. 3) 
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We will reason by equivalence, inequality (|A.2|) is equivalent to 

{ r )d 1 (x, z)dx(z, y) + (l- j)d 2 (x, z)d 2 (z, y)f < d 2 (x, z)d 2 (z, y) (A. 4) 
On one hand, 

L.H.S of inequality (|A.4j) = ^ 2 d\{x, z)d\(z, y) + (1 - i) 2 S 2 (x, z)d 2 2 {z, y) 

+27(1 - 7)di(x, z)di(z, y)d 2 {x, z)d 2 (z, y) 

On the other hand, 

R.H.S of inequality (|A.4j) = (-fdl(x, z) + (1 - 7)d|(x, z)) 

x ( 7 d?(^ y) + (l- 7)4(2, y)) 

= 7 2 ^, *)<*?(*. V) + (1 - 7) 2 4(z, z)d§(z, y) 
+7(1 - 7) (^(x, «)c^(z, y) + d 2 (z, y)d\(x, z)) 

After canceling the common terms from the left and the right hand side of 
inequality (|A.4[) . we have the inequality 

< 7(1 - 7) {d\(x, z)d 2 (z, y) + di(x, z)d 2 (z, y)) 2 

Since this last inequality is always true, then the triangular inequality is always 
true. Since the function d satisfies (i), (ii), and (Hi) then it is a metric in M. □ 

Theorem 5.1. For i.i.d random variables Ri, . . . , R n of measure \x in (V, dy) of 

finite Frechet mean and Frechet variance, and V one dimensional differ entiable 
manifold or d! dimensional dijferentiable linearly separable manifold. We have 
that the Frechet mean Rp is unique, its estimate from the data R is unique for 
all n and 



dy(R, Rf) -> (a.s) 
-> (a.s) 



-SSM - Var F 
n 



Proof. To show the uniqueness and convergence in the one dimensional case, 
we first map the manifold into K using an isometry L and then we use known 
consistency of the mean results in R to prove consistency in M. Let C be the 
class of functions, L = {Lg, 8 G 0} such that for all #0 in © 

Lg : V -> R 
x = R(6,t) ^ L 6a {x) 
such that Lg Q {x) = sign(6» - 6> )Arcd(i?(6>, t), R(6 , t)) 

For a given parameter 0q, we can consider R(9q, t) as the origin in the manifold 
V by the transformation Lg because by definition Lg (R(9o, t)) = 0. For all 6*o, 
Lg satisfies the isometry property, i.e. 

\Lg (Ri) - Lg a {R 2 )\ = Arcd(i? 1 ,i? 2 ), 
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and the distance between two points Lg (Ri) and Lg (R 2 ) does not depend on 
the origin. By definition of the space of variation and by construction of the 
transforming function Lg, this function is a homeomorphism. i.e. 

(i) Continuity of Lg Q : If Arcd(x„, x) converges to 0, then by isometry \L$ (x n ) — 
L{x)\ converges to 0. So, Lg Q is continuous. 

(ii) Invertibility of Lg : We need to show that Lg (x\) = Lg (x 2 ) => x\ — X2- 
By isometry the left hand side implies that Arcd(xi ,x 2 ) = 0, which implies 
that x\ = x 2 (because Arcd is a metric in V). 

(ii) Continuity of the inverse of Lg : We need to show that 

\Lg {x n ) - Lg (x)\ — ► =>• Arcd(x„,a;) — » 0. 

This property follows directly from the isometry. 

Moreover, L is such that L$ 1 (x) = Lg (x) +sign(#i — 6 )Lg 1 (8 ) Using L, let us 
show that the Frcchct mean is unique. We have that 



E(L 9o (R)) = Argmin^u J (Lg (R) - : 



x) 2 d/i 

Let Rq = L g ~ 1 (E(Lg (Ri))), we can show that Rq is the Frechet mean. Let F(R) 
be the Frechet function, then 

VR ± R 0} F{R) > J (Lg (Ri) - E(Lg {R))) 2 dfi, and by isometry, 

> J Arcd 2 {R l ,R )dfi 

So, Ro is the only element in the Frechet mean set. We can repeat the same 
argument to show uniqueness of the Frechet sample mean R and the empirical 
measure. 

Since by the law of large numbers, 

Li -> E(Lg (R))(a. S ) (A.5) 

then by isometry of Lg , we have that 

d v {R,R F ) -> (a.s) 

Similarly, since (Lg a (Ri) — E(Lg (Ri))) 2 are i.i.d and of finite mean Varp, then 
by law of large numbers 



- V {Lg Q (Ri) - EiLg^Ri))) 2 - Var^ 
n *• — ' 



(a.s) (A.6) 



On the other hand, 

-SSM-V&r F - (- Y^iLg^R,) ~ EiLg^Rijjf ^V & r F )+(L7 -E(L 9o (R))) 2 

(A.7) 
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From equality (|A.7[) . and using the two convergence results in (|A.6|) and (|A.5j) . 
we have that 

-'SSM-V&if ^0 (a.s) 
n 

In the dl linearly separable case, as in the onc-dimensional case, we define a map- 
pings Ltg 1 0t ... t e d i ) G jC (associated with origins parameterized by (#i,o, ■ ■ ■ , #d',o))> 
which transform the nonlinear space to a Euclidean space, i.e. 



J d' ,QJ 



x = R{{6 ll ...,6 d ,),t) i ► L {6l ^...j dlfi )(x) 

i (ei,o,...,e (J /,o)( x ) = (sign(fi,o - #i) x C 8l ,e li0 ,- 
sign(0 d /, o -0d')x C 9d , ,e d 

For all (0i.o, ■ ■ ■ , 6d',o)> , ) is an homeomorphism from (V, dy) to K d . 
The proof in this case is equivalent to the proof in the one-dimensional case. 
Since dy is a metric and L(g 1 0) ... i e , ) satisfies the isometry property, then we 
can show that it is continuous, invcrtible and the inverse is continuous. Similarly, 
the proof of uniqueness of Frcchct mean estimate and Frcchct mean is equivalent 
to the one dimensional case. □ 



Corollary 5.1. For all fixed origins O in a metric manifold (V,dv,o) 

—SSMo — ► Varpoo-.s 
n 

Proof. Let F n (x) be the estimate of the Frcchct function from the sample, i.e. 



F n (x) = -Y^dyiR^x) 

i=l 

Let x n be a minimizer of F n , i.e. 

F n (x„) = mm xeV F n (x) 

To use Theorem 2.3 of (Bhattacharya & Patrangenaru (2003)), we need to show 
that (V, dy) is a complete metric space and all bounded closed set is compact. 

1. Completeness: Let x n , and x rn two points in V, then by definition of dy 

d v (x ni x m ) = \Jjdl(x n ,x m ) + (1 - j)dl(x n ,x m ) 

So, if (x n ) is a cauchy sequence in (V,dy), it is cauchy in (V, d%) and 
{V,d 2 ). We know by construction of d\ and d 2 , (V,di) and (V, d 2 ) are 
both homeomorphic to (R 2 ,||.||) (where ||.|| denotes the norm derived 
from the Euclidean metric). By this homeomorphism, a cauchy sequence 
(x n ) in {V,d±) converges to ai in V. Similarly, a cauchy sequence (x n ) in 
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(V,d-z) converges to a 2 in V. Let us finally show that a\ = 0,2- Since V is 
a subspace of M. d , we have that 

\\x n -ai\\ < d\{x n ,ai) (A. 8) 

(b/c the Euclidean distance is the shortest distance), so (x n ) converges to 
a\ in W 1 . Similarly, we have that 

\\x n - a 2 \\ < d 2 {x ni a 2 ) (A. 9) 

(b/c the Euclidean distance is the shortest distance), so (x n ) converges to 
a,2 in IR^ . By equations (|A.8[) and (|A.9[) . and using the triangular inequal- 
ity of the Euclidean distance, we have that 

IK - «2 1 1 = 

So, eti = a 2 . Then, dy(x n ,a\) converges, so (x n ) converges. 
2. Compact sets in (V, dy). Similarly, using the homeomorphism, a closed 
and bounded set A in (V, dy) is closed and bounded in (V, d\) and (V, d 2 ). 
By the homeomorphisms to (R 2 , ||.||), the space A is compact in (V,d\) 
and (V, d%), so it is compact in (V, dy). 

So, using Theorem 2.3 of (Bhattacharya and Patrangcnaru (2003)), we have 
that 

(x n ) -> R F (a.s) 

where Rp is a Frechet mean. On the other hand, we have that 

F n (R F ) -> Var F 

Since 

\F n (x n ) - Var F \ < \F n (x n ) - F n {R F )\ + \F n (R F ) - Var F \ 
We have that 

F n (x n ) — > Vslt f (a.s) □ 

Proposition 5.1. The Frechet sample estimate at the sample minimizing origin 
O n converges to the Frechet variance at the minimizing origin O, i.e. F n (O n ) — > 
Var Fd . 

Proof. Let F(0) = Varpo Then, by the property of minimums F n (O n ) and 
F(0) and continuity of F n and i 7, we have that, 

First, Ve,3(a,/3) s.t. 
F„(a)-e/2< F n (O n ), and 
FG9) - e/2 < f{6). 
Second, 

F n {O n ) < F n (f3) because O n is a minimizer of i 7 ^, and 

F(O) < F{ a ) because O is a minimizer of i* 1 . 
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In addition, from Proposition [5J3 we know that F n converges to F a.s. Thus, 

3N s.t. Vn > N, F n (a) - F(a) > -e/2, and 
F n (f3)-F((3) <e/2. 

Finally, by combining all the inequalities we have that 

Ve, 3N, s.t. Vn > 2V, -e < F„(O n ) - F(O) < e. 

which proves the convergence. □ 
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